Knowing a network by walking on it: emergence of scaling 
A model for growing networks is introduced, having as a main ingredient that new nodes are
attached to the network through one existing node and then explore the network through the 
links of the visited nodes. From exact calculations of two limiting cases and numerical simulations 
the phase diagram of the model is obtained. In the stationary limit of large network sizes, a phase
transition from a network with finite average connectivity to a network with a power law distribution 
of connectivities, with no finite average, is found. Results are compared with measurements on real 
networks. 
A network is composed of a set of nodes and a set of links among them. The topological
properties of disordered networks have been studied for a long time. A well-known example is
the work of Erdős and Rényi [1], where a network generated by placing links among the nodes
at random is studied. However, such a model is not able to describe the topological
properties of real complex networks. Current studies are thus mainly focused on finding the
mechanisms which generate such networks.

Watts and Strogatz introduced the "small-world" network model, an interpolation between
regular lattices and random graphs. Different social, biological and economic networks have
been found to be well described by such an approach. Such a model is more appropriate for
networks where the number of nodes remains constant. Moreover, it yields a distribution of
connectivities P(k) peaked around a characteristic value.

On the other hand, some authors have studied the topological properties of networks generated
by evolutionary dynamics. In this case the network topology is changed using extremal
dynamics rules inspired by the Bak-Sneppen model for biological evolution. Models with a
fixed and with a variable number of nodes have been proposed. The study of the topological
properties of the second model reveals that the distribution of connectivities changes in
time, yielding either exponential or power law distributions.

Finally, there is a class of growing-network models where the addition of new nodes leads to
scale-free structures. In this case the connectivity distribution follows a power law decay
$P(k) \sim k^{-\gamma}$ ($2 < \gamma < 3$). Examples are the World Wide Web (WWW), where HTML
documents are the nodes and the links to other documents in the WWW are the links [11], and
the citation of scientific publications, where papers are the nodes and citations among them
are the links [12]. In these two examples the number of nodes (HTML documents or papers) is
clearly increasing in time. Here the attention is focused on this class of networks.

Different points of view appear when describing the evolution in time of the set of links,
which is actually the mechanism introducing randomness in scale-free models. In the approach
by Huberman and Adamic the number of new links pointing to a node is a random fraction of the
number of links which are already pointing to that node. On the other hand, Barabási and
Albert have proposed a preferential attachment, where new nodes are linked with higher
probability to those existing nodes which have higher connectivity. This model has recently
been shown to be a particular case of a model proposed by Simon in the fifties [13]. The
study of this class of growing networks is currently very active and new variants have been
proposed, keeping the preferential attachment as the main ingredient.

However, these growing-network models do not take into account one fundamental property of
real networks: a new node does not have "knowledge" of the entire network. For instance, a
scientist writing a manuscript does not know all the already published papers which may be
related to the subject at hand. In fact, the scientist knows only a few papers, finds new
ones through the references appearing in them, and continues the search recursively using
the references in the newly found papers. Thus, the model introduced in this paper is based
on the fact that we know the network, or at least part of it, by "walking" on it. This
feature, together with growth, yields the scaling behavior observed in real growing networks.

The model is defined by giving an initial condition and a set of evolution rules. Initial
condition: one starts with one node, N = 1, and an empty set of links. The evolution rules
are divided into adding a node and walking through the network. Adding: a new node N + 1 is
created with a link to one existing node selected at random. Walking: the new node "walks"
through all the nodes pointed to by the selected node and creates a link to each of them
with probability p. This last rule is repeated recursively with the nodes reached through
the newly created links. When no new link is created, a new node is added.

One run of this algorithm for p = 0.5 and up to N = 5 nodes is shown in Fig. 1. N = 1: a
node is created (node 1). N = 2: a second node (node 2) is created, which can only point to
node 1. N = 3: node 3 is created and it can point either to node 1 or to node 2. In the
particular case shown in Fig. 1 it points to node 1. Since node 1 does not have any link,
the rule stops. N = 4: node 4 is created, which can point to node 1, 2 or 3. In this case it
points to node 2. Now node 2 has a link to node 1, so with probability p node 4 creates a
link to node 1 (it is not created in this case). N = 5: node 5 is created, which can point
to node 1, 2, 3 or 4. In this case it points to node 4. Since node 4 has a link to node 2,
node 5 creates a link to node 2 with probability p (it is created in this case). But now
node 2 has a link to node 1, so with probability p node 5 creates a link to node 1 (it is
not created in this case). And so on.
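
The evolution rules can be sketched in a few lines of Python. This is only a minimal
illustration under stated assumptions, not the code used for the results reported here: the
names `grow_network` and `connectivity_distribution` are hypothetical, nodes are labeled
from 0 instead of 1, and a node that was not linked along one path may be tried again if the
walk reaches it along another path, a detail the rules above leave open.

```python
import random
from collections import Counter


def grow_network(n_nodes, p, seed=None):
    """Grow a directed network by the add-and-walk rules (illustrative sketch).

    Returns (out_links, in_degree): out_links[i] is the set of nodes that
    node i points to, in_degree[i] is the connectivity k of node i.
    """
    rng = random.Random(seed)
    out_links = [set()]   # initial condition: one node, no links
    in_degree = [0]
    for new in range(1, n_nodes):
        # Adding: link the new node to one existing node chosen at random.
        first = rng.randrange(new)
        linked = {first}
        frontier = [first]
        # Walking: follow the out-links of every node just linked and link
        # to each node found there with probability p, recursively.
        while frontier:
            current = frontier.pop()
            for neighbor in out_links[current]:
                if neighbor not in linked and rng.random() < p:
                    linked.add(neighbor)
                    frontier.append(neighbor)
        out_links.append(linked)
        in_degree.append(0)   # the created node has connectivity 0
        for target in linked:
            in_degree[target] += 1
    return out_links, in_degree


def connectivity_distribution(in_degree):
    """Return {k: fraction of nodes with connectivity k}."""
    counts = Counter(in_degree)
    n = len(in_degree)
    return {k: counts[k] / n for k in sorted(counts)}


if __name__ == "__main__":
    links, degrees = grow_network(5, 0.5, seed=0)
    print(links)
    print(connectivity_distribution(degrees))
```

Calling `grow_network(5, 0.5)` generates one realization of the kind of run described above;
different seeds give different realizations.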

The main assumption of this model is that one has the first contact with the network through
one node and then explores the rest of it by "walking" through the directed links. Moreover,
there is a time scale separation between the addition of nodes and the mechanism of creation
of new links. The network is clearly a directed graph, and between two nodes there can be
only one link, which goes from one to the other. The only parameter of the model is p, which
may have different interpretations according to the particular problem one is modeling. For
instance, in the problem of citations p is the fraction of the papers appearing in the list
of references of a given paper which may be of interest to us.

Let us now investigate the evolution of the connectivity distribution as N grows. Here the
connectivity is defined as the number of links pointing to a node (in-degree). When a new
node is added to the network, the connectivity of any node already in the network remains
constant or increases by one. For instance, in Fig. 1, from N = 3 to N = 4 the connectivity
of node 2 increases by one while that of the other nodes remains constant. Moreover, the
created node has connectivity k = 0.

Let w(k, N) be the probability that when adding the 
N + 1 node the connectivity of a node with connectivity 
k increases by one. With this definition, the number of 
nodes n(k, N) with connectivity k evolves according to 
the set of equations 

$$ n(0, N+1) = n(0, N) + 1 - w(0, N)\, n(0, N), \tag{1} $$

$$ n(k, N+1) = n(k, N) + w(k-1, N)\, n(k-1, N) - w(k, N)\, n(k, N), \quad \text{for } k > 0. \tag{2} $$

For $N \gg 1$ one can look for stationary solutions of this set of equations. In this limit
w(k, N) should be of the form w(k, N) = W(k)/N, where 1/N comes from the fact that the new
node is attached to an existing node selected at random, which happens with probability 1/N
(this will be demonstrated below for two limiting cases). Then, taking into account that
n(k, N) = N P(k, N) and the stationary condition P(k, N + 1) = P(k, N) = P(k), one obtains

$$ P(0) = \frac{1}{1 + W(0)}, \tag{3} $$

$$ P(k) = \frac{W(k-1)}{1 + W(k)}\, P(k-1), \quad \text{for } k > 0. \tag{4} $$

Thus, determining W(k) one can iterate (4) to obtain the stationary distribution P(k).
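
The step from Eqs. (1) and (2) to the stationary relations can be written out explicitly.
The following is a worked version of that substitution, assuming the scaling form
$w(k, N) = W(k)/N$ and $n(k, N) = N P(k)$ stated above:

```latex
\begin{align*}
(N+1)\,P(0) &= N\,P(0) + 1 - W(0)\,P(0)
  &&\Longrightarrow\quad P(0) = \frac{1}{1 + W(0)}, \\
(N+1)\,P(k) &= N\,P(k) + W(k-1)\,P(k-1) - W(k)\,P(k)
  &&\Longrightarrow\quad P(k) = \frac{W(k-1)}{1 + W(k)}\,P(k-1), \quad k > 0.
\end{align*}
```

In both lines the terms proportional to N cancel, so the stationary distribution no longer
depends on the network size.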

A node with connectivity k = 0 can only increase its connectivity if the new node is
attached to it, which happens with probability 1/N. Hence, w(0, N) = 1/N and, therefore,
W(0) = 1. From this result and (3) it follows that P(0) = 1/2. This result is independent of
the value of p, i.e. in the present model half of the nodes have no links pointing to them.

The form of w(k, N) for k > 0 is not known. Here only the limiting cases p = 0 and p = 1 are
solved exactly. For p = 0, independent of the connectivity of a node, the probability that
its connectivity increases by one is w(k, N) = 1/N, which is just the probability that the
new node is attached to it. Hence, W(k) = 1 independent of k. Substituting this result in
(4) and iterating with the initial condition P(0) = 1/2 one obtains

$$ P(k) = 2^{-(k+1)}, \quad \text{for } p = 0. \tag{5} $$

In the other limit, p = 1, a node will increase its connectivity either if the new node is
attached to it or to one of the nodes with a link to it, i.e. w(k, N) = (1 + k)/N. In this
case after iteration of (4) one obtains

$$ P(k) = [(k+1)(k+2)]^{-1}, \quad \text{for } p = 1. \tag{6} $$
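
Both closed forms can be checked by iterating the stationary recursion directly. The sketch
below assumes the recursion $P(k) = W(k-1)\,P(k-1)/[1 + W(k)]$ with starting value
$P(0) = 1/[1 + W(0)]$, as follows from Eqs. (1) and (2). The function name
`stationary_distribution` is hypothetical, and exact rational arithmetic is used only so
that the comparison with Eqs. (5) and (6) is free of rounding errors.

```python
from fractions import Fraction


def stationary_distribution(W, k_max):
    """Iterate P(k) = W(k-1) / (1 + W(k)) * P(k-1) from P(0) = 1 / (1 + W(0))."""
    P = [Fraction(1) / (1 + W(0))]
    for k in range(1, k_max + 1):
        P.append(Fraction(W(k - 1)) / (1 + W(k)) * P[k - 1])
    return P


# p = 0: W(k) = 1 for all k, expected P(k) = 2^-(k+1)
P_p0 = stationary_distribution(lambda k: 1, 10)
# p = 1: W(k) = 1 + k, expected P(k) = 1 / ((k+1)(k+2))
P_p1 = stationary_distribution(lambda k: 1 + k, 10)

for k in range(11):
    assert P_p0[k] == Fraction(1, 2 ** (k + 1))
    assert P_p1[k] == Fraction(1, (k + 1) * (k + 2))
```

Any other guess for W(k) can be passed in the same way, which makes it easy to explore how
the shape of P(k) depends on the growth of W(k) with k.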

Notice that for p = 1, since w(k, N) = (1 + k)/N, there is a preferential attachment to
nodes with larger connectivity. This is one of the main ingredients introduced by Barabási
and Albert to obtain the desired emergence of scaling. Actually, in their model w(k, N) is
also linear in k/N. However, there the preferential attachment is imposed, while here, on
the contrary, it appears self-consistently from the dynamics of the network. The evolution
rules of the model do not show a priori the existence of a preferential attachment, but it
is clear that nodes with larger connectivity become more visible when one "walks" through
the network.

These limiting cases are described by distributions which are qualitatively different. For
p = 0 the distribution is exponential with a finite average connectivity. On the contrary,
for p = 1 the distribution is the power law decay $P(k) \sim k^{-2}$ for large k. This power
law decay goes up to the largest possible connectivity k = N - 1, while P(k, N) = 0 for
$k \geq N$. Then for large N the average connectivity scales as

$$ \langle k \rangle = A + \ln N, \tag{7} $$

where A is independent of N. The average connectivity thus diverges when $N \to \infty$.
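
The logarithmic growth in Eq. (7) can be illustrated by summing $k\,P(k)$ up to the cutoff
k = N - 1. The sketch below assumes that the stationary form $P(k) = [(k+1)(k+2)]^{-1}$
holds unchanged up to the cutoff, which is only an approximation for finite N;
`mean_connectivity` is a hypothetical helper name.

```python
import math


def mean_connectivity(n_nodes):
    """Sum k * P(k) for k = 0 .. n_nodes - 1 with P(k) = 1 / ((k + 1) * (k + 2))."""
    return sum(k / ((k + 1) * (k + 2)) for k in range(n_nodes))


if __name__ == "__main__":
    for n in (10**2, 10**3, 10**4, 10**5):
        # Subtracting ln N isolates the N-independent part, the A of Eq. (7).
        print(n, mean_connectivity(n) - math.log(n))
```

Under this approximation the printed difference settles to a constant as N grows, which is
the behavior expressed by Eq. (7).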

Since the limiting cases p = 0 and p = 1 give qualitatively different behaviors, there
should be a certain probability threshold $p_c$ where a transition from a network with
finite average connectivity to a scale-free network takes place. In the absence of
analytical results for 0 < p < 1, numerical simulations are performed in order to explore
this part of the phase diagram.

The maximum network size reached was 81920 nodes, and for each value of p reported here an
average was taken over 100 runs of the algorithm which generates the network. The resulting
connectivity distribution is shown in Fig. 2. The first thing to be noticed is that the
analytical results in (5) and (6) for the limiting cases p = 0 and p = 1, respectively, are
in very good agreement with the numerical data. Second, the transition from a distribution
with finite average to a power law takes place at an intermediate probability
$0 < p_c < 1$, where $p_c$ is in the neighborhood of 0.4.

To obtain a more precise estimate of the threshold, the scaling of the average connectivity
$\langle k \rangle$ with N was investigated. For $p < p_c$ it was found to saturate to a
finite value when $N \gg 1$, while for $p > p_c$ it grows logarithmically with N, as in the
limiting case p = 1. The results for p = 0.1 and p = 0.9 are shown in Fig. 3. In the
neighborhood of the threshold the network sizes needed to reach the stationary state are not
accessible to the present numerical simulations. Thus, the following approximate method was
used to determine $p_c$.

The numerical data were fitted by the parabola $\langle k \rangle(N) = a + bx + cx^2$, where
$x = \log N$ and a, b and c are fitting parameters. For $p > p_c$ the parameter c is
expected to be zero, while for $p < p_c$ it is negative, as a consequence of the tendency of
$\langle k \rangle(N)$ to saturate to the stationary value. Actually, due to numerical
errors, c will never be exactly zero. Here $p_c$ is estimated by the value of p at which c
changes sign, becoming either zero or positive. Using this criterion one obtains
$p_c = 0.39 \pm 0.01$.
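
The fitting step can be sketched as follows. This is an illustration with synthetic data
rather than the procedure actually used for the results above: `fit_parabola` is a
hypothetical helper that solves the least-squares normal equations directly, and the two
test curves (one exactly logarithmic, one saturating) as well as the doubling sequence of
sizes are assumptions chosen only to show the expected sign of c.

```python
import math


def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Normal equations: sum_j S[i+j] * coef[j] = T[i], with power sums S and T.
    S = [sum(x ** m for x in xs) for m in range(5)]
    T = [sum((x ** m) * y for x, y in zip(xs, ys)) for m in range(3)]
    A = [[S[i + j] for j in range(3)] + [T[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * u for v, u in zip(A[r], A[col])]
    return tuple(A[i][3] / A[i][i] for i in range(3))


# Hypothetical network sizes, doubling from 10 up to 81920.
sizes = [10 * 2 ** i for i in range(14)]
xs = [math.log(n) for n in sizes]

log_growth = [1.0 + 2.0 * x for x in xs]     # synthetic <k> growing as log N
saturating = [3.0 - 3.0 / n for n in sizes]  # synthetic <k> saturating at large N

a_log, b_log, c_log = fit_parabola(xs, log_growth)
a_sat, b_sat, c_sat = fit_parabola(xs, saturating)

if __name__ == "__main__":
    print("logarithmic growth: c =", c_log)
    print("saturating curve:   c =", c_sat)
```

For the logarithmic curve the fitted c vanishes up to rounding, while for the saturating
curve it is negative, which is the sign change used above as the criterion for $p_c$.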

Let us now focus our attention on the form of P(k). In Fig. 2 it can be seen that for
$p < p_c$ the shape of the connectivity distribution depends on p. For $p > p_c$, although
there are some deviations for small k, the large k behavior is characterized by the power
law decay $P(k) \sim k^{-\gamma}$ with an exponent $\gamma = 2.0 \pm 0.1$. Thus, above the
threshold, the connectivity distribution of the network is very robust, showing little
variation when p changes.

These features are very similar to those observed in some sandpile models, the paradigm of
the theory of self-organized criticality [19]. As in these models, there is a time scale
separation, here between the addition of new nodes and their "walk" through the network. In
the thermodynamic limit of large system sizes, the phase diagram of the model is divided
into a sub-critical and a critical region, and in the critical region the power law exponent
does not depend on the control parameter. All these similarities put these models in the
same class, with a self-organized critical region in the phase diagram.

There are real complex networks which can be described by the present model. The network of
citations among papers published in journals is an example. In this case k is the number of
times a paper is cited in other papers. The analysis of the available data yields the power
law exponent $\gamma = 3$ [12]. This value is larger than the universal value $\gamma = 2$
observed in the critical region of the model introduced here. Thus, it seems that the
citation problem lies in some part of the sub-critical region below $p_c$. Indeed, the
fraction of papers one usually considers for citation in a future publication is small in
comparison with the total number of papers referenced in the papers one knows. However, more
data are needed to reach a final conclusion.

In the WWW network HTML documents are the nodes, the links to other documents in the WWW are
the links, and k is the number of times an HTML document appears as a link in other HTML
documents. In HTML documents the links are created when the document is created but can also
be changed later on. Hence, the addition of new nodes is not the only mechanism changing the
set of links. However, the rate of addition of new HTML documents is actually very high, so
one expects that the addition of new documents is the dominant mechanism and, therefore,
that the network can be described by the present model. Measurements reported in the
literature [11] yield the exponent $\gamma = 2.1 \pm 0.1$, in very good agreement with the
exponent $\gamma = 2$ in the critical region. Hence, the WWW network is in some part of the
critical region.



Thus, with only one control parameter the present model is able to describe the form of the
connectivity distribution of networks with different topologies. For $0 < p < p_c$ it
describes networks with a finite average connectivity, which may have a power law decay for
small connectivities but with a cutoff independent of the network size for large
connectivities. On the contrary, for $p_c < p < 1$ it describes networks with a power law
distribution of connectivities, up to a cutoff determined by the network size. The
transition from one behavior to the other is determined by the parameter p, which measures
the probability to create a new link and continue the search in the network through the
added link.

FIG. 1. One run up to 5 nodes of the algorithm which generates the network using p = 0.5.
The number of nodes in the network is indicated on the horizontal axis. Different gray
levels indicate different connectivities, from k = 0 (white) to k = 3 (black). Dashed lines
indicate that the new node performed one "walk" before creating this link.

FIG. 2. Connectivity distribution for different values of p. The points correspond to, from
left to right, p = 0, p = 0.1, p = 0.2, p = 0.3, p = 0.4, and p = 1, respectively. The
continuous lines correspond to the exact results for p = 0 and p = 1 in Eqs. (5) and (6),
respectively.




FIG. 3. Average connectivity as a function of the number of 
nodes N added to the network for p = 0.1 (inset) and p = 0.9 
(full plot), in a semi-log scale. For p = 0.1 the plot clearly
saturates to a finite value. On the contrary, for p = 0.9 $\langle k \rangle$
grows logarithmically, which is manifested as a straight line
in the semi-log plot. 